Estimating Entropy of Data Streams Using Compressed Counting
Abstract
The Shannon entropy is a widely used summary statistic in areas such as network traffic measurement, anomaly detection, neural computations, and spike trains. This study focuses on estimating the Shannon entropy of data streams. It is known that Shannon entropy can be approximated by Rényi entropy or Tsallis entropy, both of which are functions of the αth frequency moments and approach Shannon entropy as α → 1. Compressed Counting (CC) [24] is a new method for approximating the αth frequency moments of data streams. Our contributions include:
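To illustrate the α → 1 approximation described in the abstract above, the following minimal Python sketch computes the Rényi and Tsallis entropies from exact αth frequency moments and compares them with the Shannon entropy. It is only an illustration under stated assumptions: the function names and the toy count vector are hypothetical, the moments are computed exactly from the full vector (not estimated with Compressed Counting), and the formulas used are the standard textbook definitions, H_α = log(F_α / F_1^α) / (1 − α) for Rényi and T_α = (1 − F_α / F_1^α) / (α − 1) for Tsallis, where F_α is the sum of the αth powers of the counts.

```python
import math


def frequency_moment(counts, alpha):
    """Exact alpha-th frequency moment: sum of counts[i]**alpha over positive entries."""
    return sum(c ** alpha for c in counts if c > 0)


def shannon_entropy(counts):
    """Shannon entropy (natural log) of the distribution counts / sum(counts)."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def renyi_entropy(counts, alpha):
    """Renyi entropy expressed through the frequency moments F_alpha and F_1 (alpha != 1)."""
    f1 = frequency_moment(counts, 1.0)
    fa = frequency_moment(counts, alpha)
    return math.log(fa / f1 ** alpha) / (1.0 - alpha)


def tsallis_entropy(counts, alpha):
    """Tsallis entropy expressed through the frequency moments F_alpha and F_1 (alpha != 1)."""
    f1 = frequency_moment(counts, 1.0)
    fa = frequency_moment(counts, alpha)
    return (1.0 - fa / f1 ** alpha) / (alpha - 1.0)


if __name__ == "__main__":
    # Hypothetical toy stream summary: item i has been seen counts[i] times.
    counts = [5, 3, 2, 1, 1]
    print("Shannon:", shannon_entropy(counts))
    for alpha in (1.5, 1.1, 1.01, 1.001):
        print(alpha, renyi_entropy(counts, alpha), tsallis_entropy(counts, alpha))
```

Running the script shows both approximations moving toward the Shannon value as α gets closer to 1. In a streaming setting the exact moments would be replaced by estimates, which is where the accuracy of the moment estimator near α = 1 becomes critical.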
Similar resources
A Very Efficient Scheme for Estimating Entropy of Data Streams Using Compressed Counting
Compressed Counting (CC) was recently proposed for approximating the αth frequency moments of data streams, for 0 < α ≤ 2. Under the relaxed strict-Turnstile model, CC dramatically improves the standard algorithm based on symmetric stable random projections, especially as α → 1. A direct application of CC is to estimate the entropy, which is an important summary statistic in Web/network measure...
Improving Compressed Counting
Compressed Counting (CC) [22] was recently proposed for estimating the αth frequency moments of data streams, where 0 < α ≤ 2. CC can be used for estimating Shannon entropy, which can be approximated by certain functions of the αth frequency moments as α → 1. Monitoring Shannon entropy for anomaly detection (e.g., DDoS attacks) in large networks is an important task. This paper presents a new a...
Entropy Estimations Using Correlated Symmetric Stable Random Projections
Methods for efficiently estimating Shannon entropy of data streams have important applications in learning, data mining, and network anomaly detection (e.g., DDoS attacks). For nonnegative data streams, the method of Compressed Counting (CC) [11, 13] based on maximally-skewed stable random projections can provide accurate estimates of the Shannon entropy using small storage. However, CC is...
On the Sample Complexity of Compressed Counting
The problem of “scaling up for high dimensional data and high speed data streams” is among the “ten challenging problems in data mining research” [36]. This paper is devoted to estimating entropy of data streams. Mining data streams [19, 4, 1, 29] in, for example, 100 TB scale databases has become an important area of research, e.g., [10, 1], as network data can easily reach that scale [36]. Search engi...
A New Algorithm for Compressed Counting with Applications in Shannon Entropy Estimation in Dynamic Data
Efficient estimation of the moments and Shannon entropy of data streams is an important task in modern machine learning and data mining. To estimate the Shannon entropy, it suffices to accurately estimate the α-th moment with ∆ = |1 − α| ≈ 0. To guarantee that the error of estimated Shannon entropy is within a ν-additive factor, the method of symmetric stable random projections requires O ( 1 ν...
Journal: CoRR
Volume: abs/0910.1495
Publication date: 2009